XML-Structured Documents: Retrievable Units and Inheritance
Authors
Abstract
We consider the retrieval of XML-structured documents, and of passages from such documents, defined as elements of the XML structure. These are considered from the point of view of passage retrieval, as a form of document retrieval. A retrievable unit (an element chosen as defining suitable passages for retrieval) is a textual document in its own right, but may inherit information from the other parts of the same document. Again, this inheritance is defined in terms of the XML structure. All retrievable units are mapped onto a common field structure, and the ranking function is a standard document retrieval function with a suitable field weighting. A small experiment to demonstrate the idea, using INEX data, is described. Reprinted from: H. L. Larsen, G. Pasi, D. Ortiz-Arroyo, T. Andreasen and H. Christiansen (eds), Flexible query answering systems, Milan, June 2006. Springer, 2006. (pp 121-132).
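The field mapping and inheritance described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's actual method: the tag names (`article`, `sec`, `title`, `p`), the three-field structure, the weights in `FIELD_WEIGHTS`, and the simplified BM25F-style scoring (which omits length normalization) are all hypothetical.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical field weights; the abstract does not report actual values.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "inherited": 1.0}

def tokens(text):
    """Lowercase whitespace tokenizer, kept deliberately simple."""
    return text.lower().split()

def retrievable_units(xml_text, unit_tag="sec"):
    """Map every <sec> element onto a common field structure.

    Each unit keeps its own title and body and inherits the
    title of the enclosing article as a separate field.
    """
    root = ET.fromstring(xml_text)
    article_title = root.findtext("title", default="")
    units = []
    for sec in root.iter(unit_tag):
        units.append({
            "title": tokens(sec.findtext("title", default="")),
            "body": tokens(" ".join(p.text or "" for p in sec.iter("p"))),
            "inherited": tokens(article_title),
        })
    return units

def score(query, unit, units, k1=1.2):
    """Simplified BM25F-style score: weighted term frequencies are summed
    across fields, then saturated and multiplied by an idf factor."""
    total = 0.0
    n_units = len(units)
    for term in tokens(query):
        weighted_tf = sum(w * unit[f].count(term) for f, w in FIELD_WEIGHTS.items())
        if weighted_tf == 0:
            continue
        df = sum(1 for u in units if any(term in u[f] for f in FIELD_WEIGHTS))
        idf = math.log(1 + (n_units - df + 0.5) / (df + 0.5))
        total += idf * weighted_tf / (k1 + weighted_tf)
    return total

# Toy document, invented for this example.
DOC = """
<article>
  <title>XML retrieval</title>
  <sec><title>Ranking functions</title><p>field weighting for ranking</p></sec>
  <sec><title>Data</title><p>the INEX collection</p></sec>
</article>
"""
UNITS = retrievable_units(DOC)
```

In this sketch, a query term that occurs only in the article title still matches every section, because each section carries that title in its `inherited` field. A term in a section's own title counts more than one in its body because of the higher field weight.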
Similar resources
Comparing XML-IR Query Formation Interfaces
XML information retrieval (XML-IR) systems differ from traditional information retrieval systems by using structure of XML documents to retrieve more specific units of information than the documents themselves. Users interact with XML-IR systems via structured queries that express their content and structural requirements. Historically, it has been common belief within the XML-IR community that...
Metaheuristic Clustering of Persian XML Documents Based on Structural and Content Similarity
Given the growing number of XML documents, organizing them effectively so that useful information can be retrieved from them is essential. One possible solution is to cluster XML documents in order to discover knowledge. A key issue in clustering XML documents is how to measure the similarity between them. Conventional clustering of text documents using a do...
A Model for Representing and Retrieving Heterogeneous Structured Documents Based on Evidential Reasoning
Documents often display an internal structure; they are composed of components. For example, a journal contains several articles, which themselves contain paragraphs, tables, etc. With structured documents, the retrievable units should be the document components as well as the whole document. The components of a structured document can be of different types: various media, located in a number o...
Topic Field Selection and Smoothing for XML Retrieval
Information retrieval from XML documents offers an opportunity to go below the document level in search of relevant information, making any element of an XML document a retrievable unit. We consider two dimensions along which we compare this element retrieval task with the traditional document retrieval task. We investigate how different topic representations and language model smoothing approa...
NATIVE XML DATABASES vs. RELATIONAL DATABASES IN DEALING WITH XML DOCUMENTS
When dealing with data-centric XML documents, it is possible to convert XML documents into a relational database, which can then be queried using SQL. Such relational databases are called XML-enabled databases. On the other hand, the best choice for storing, updating and retrieving document-centric XML documents is usually a native XML database (NXD). NXDs store XML documents as logical units, ...